SUMMARY


An  analysis  of  trends  in  predictive  validity  coefficients  across  time 
and  repeated  performance  assessments  shows  highly  significant  and  consistent 
trends  in  validities  as  a  function  of  time  and/or  interpolated  practice. 
Commonly used ability measures show decreasing predictive validities for the
prediction of temporally more remote performance assessments. Within-study
corrections for differential restriction of range and for attenuation due to
unreliability across the different performance assessments increased the
negative slopes of the regressions of predictive validity on time or ordinal
position of performance assessment. The median validity decrement from
initial to final performance assessment, corrected for differential range
restriction, attenuation, and within-study sampling fluctuations, was -.29.
The mean of the trimmed distribution of corrected validity decrements, after
eliminating the two most extreme cases, was -.115. The average within-study
correlation between predictive validity and time or ordinal position of
performance assessment was -.80. A similar analysis of stability
coefficients  of  time  period-by-time  period  or  trial-by-trial  performance 
assessment  correlations  revealed  very  similar  albeit  slightly  more 
consistent  findings.  Theoretical  explanations  stressing  the  dynamic  nature 
of  human  abilities,  the  changing  nature  of  abilities  required  for  task 
performance,  and  social  competition  factors  are  discussed  as  reasons  for  the 
predictive  validity  decrements. 
PREFACE


This  technical  paper  contains  a  meta-analysis  of  empirical  articles 
containing  data  relevant  to  questions  about  temporal  declines  in  predictive 
validities  and  declines  in  relationships  between  assessments  of  skilled 
performance. This meta-analysis establishes the framework within which
ongoing University of Illinois studies testing explanations for the observed
declines can be interpreted. It confirms the generality of the phenomenon
across a wide variety of skilled and cognitive performance areas. With the
possible exception of performance in law school, no performance area appears
to be immune to these declines in predictive validities and performance
stabilities across time and repeated trials. The magnitude of the decline
varies with initial predictive validity and the length of the study, but over
time it is large enough to raise serious questions about the benefits of
using selection tests. Current investigations of the reasons for these
validity and stability declines may provide theoretical explanations for
these powerful effects.

The  authors  thank  James  Austin,  Kathy  Hanisch,  Lloyd  Humphreys,  and 
Mary  Roznowski  for  comments  on  earlier  drafts  of  this  manuscript.  Their 
comments  improved  and  strengthened  the  analyses  and  interpretation  of  the 
results.  This  study  was  supported  in  part  by  Contract  #F336 15-87-C-00 1^4 
from  Brooks  AFB,  Texas.  The  views  of  the  authors  do  not  necessarily 
represent  those  of  the  Air  Force  Human  Resources  Laboratory. 

Requests  for  reprints  may  be  sent  to  Dr.  Charles  L.  Hulin,  Department 
of Psychology, 603 East Daniel, Champaign, IL 61820.
ADDING A DIMENSION:

TIME  AS  A  FACTOR  IN  THE 
GENERALIZABILITY  OF  PREDICTIVE  RELATIONSHIPS 

I.  INTRODUCTION 

Studies of the predictions of future performance from current abilities
have typically ignored the time facet in prediction equations. With the
exception of Alvares and Hulin (1972), Fleishman (1960), Humphreys (1968),
and Humphreys and Taber (1973), the goals of most investigators have been to
establish generalizability across populations of individuals, abilities
(used as predictors), tasks, and situations. Most analyses of the
generalizability of predictive relations have examined whether variance in
predictive validities across elements of these four facets or populations
can be attributed to statistical artifacts or to real differences in
predictive relationships (Hunter, Schmidt, & Jackson, 1982; Schmidt &
Hunter, 1977; Schmidt, Hunter, & Caplan, 1981).

A narrative review by Henry and Hulin (1987) of the literature relevant
to the stability of predictive validities across time suggested that most
empirical predictive validities were less stable than has commonly been
assumed and that the instability was general across content areas. They
reported that temporally decreasing predictive validities have been found in
most areas of skilled performance.

Psychomotor skills such as discriminant reaction time (Fleishman & Hempel,
1954, 1955), two-dimensional tracking (Dunham, 1974), rotary pursuit
(Fleishman, 1960), two-handed coordination (Fleishman & Rich, 1963), and
student pilot performance during training (Alvares & Hulin, 1973) were
typically found to have decreasing predictive validities or decreasing
intertrial correlations. Studies of the predictive validities for academic
performance in college (Humphreys, 1968; Humphreys & Taber, 1973) and
graduate school (Lin & Humphreys, 1977) also reported systematically
changing predictive validities when evaluated against performance assessed
at different stages of learning or performance. The time periods examined
in such studies have ranged from 1- or 2-hour experiments (Dunham, 1974;
Fleishman, 1960), to performance across 15 weeks of flight training (Alvares
& Hulin, 1973), to performance of engineers across 20 years (Brenner &
Lockwood, 1965), to performance of scientists across five decades (Dennis,
1954, 1956).

Studies of growth and development in the area of human intelligence are
also relevant. Henry and Hulin (1987) argued that the evidence from many of
these studies supports an interpretation of generally decreasing predictive
validities (Anderson, 1939; Humphreys & Davey, 1984). Ackerman (1989) has
challenged the conclusions of Henry and Hulin (1987) and the previous
conclusions of Alvares and Hulin (1972, 1973) about the ubiquity of
decreasing predictive validities.

The purpose of this analysis and article is to determine whether predictive
validities in general vary systematically as a function of time, stage of
practice, or length of time on a job. Specifically, we are concerned with
whether temporally more remote performance assessments may be less strongly
related than temporally close performance assessments to abilities assessed
before performance or training and used as predictors of future performance.

Validity  Generalization 

We shall not review the general validity evidence provided by primary
empirical studies and subsequent meta-analyses of these studies. Meta-analyses
of empirical studies of the predictive validities of ability measures
for predicting performance have been carried out by Schmidt and Hunter (1977)
and their colleagues (Pearlman, Schmidt, & Hunter, 1980; Schmidt,
Gast-Rosenberg, & Hunter, 1980; Schmidt & Hunter, 1977). We do, however,
present a brief synopsis of past research in this area to establish a
framework for our analyses and interpretations.

Individuals 


The primary conclusion from past work on validity generalization is
that validity estimates generalize across subpopulations of individuals,
abilities/tasks, and situations. Schmidt and Hunter (1981) claim:
"Professionally developed cognitive ability tests are valid predictors of
performance on the job and in training for all settings" (1981, p. 1128).
Analyses by Drasgow (1982) and Drasgow and Kang (1984), however, have raised
questions about the power of most analyses of differential validity across
subpopulations of individuals.

Tests  and  Tasks 


Investigations of differential validity, the variance in validities of
a given ability measure for different criterion tasks, have examined
correlations between many combinations of ability tests and criterion task
performance. Although there are disagreements about the appropriate
conclusion, the results indicate that there are small but statistically
reliable differential validities for some test and task combinations.
Measures of clerical/scholastic ability correlate more strongly with
performance measures from a job family composed of clerical tasks than they
do with performance measures in a mechanical job family (Humphreys, 1979).
Conversely, ability measures based on mechanical/practical tests (Humphreys,
1979; Thurstone, 1938; Vernon, 1950) correlate more strongly with
performance measures on mechanical tasks than they do with performance in
clerical jobs (Humphreys, 1979).

Aside from this small but reliable difference in predictive validities
between certain test-task combinations, there is little evidence for
differential validity within broad job families. The observed differential
validities are theoretically important. They may, however, be of limited
practical utility: a test of general cognitive or intellectual ability will
usually have a significant predictive validity for early performance on many
jobs (Hunter, Schmidt, & Jackson, 1982; Schmidt, Gast-Rosenberg, & Hunter,
1980; Schmidt & Hunter, 1977, 1981; Schmidt, Hunter, & Caplan, 1981).
Situations


Situations,  the  final  facet  normally  considered  in  validity 
generalization  studies,  have  typically  been  investigated  by  studying 
validity  across  organizations  as  elements  of  a  population  of  situations. 

The assumption is that small differences in situational variables, often
instantiated as organizational climate (Schneider & Bartlett, 1968), would
moderate test validity. Meta-analyses have demonstrated that the variance
in observed empirical predictive validities across situations often can be
accounted for by three sources of artifactual variance: sampling variance,
unreliability, and restriction of range (Hunter, Schmidt, & Jackson, 1982;
Schmidt & Hunter, 1977, 1981). After correcting for these three artifacts,
there is little variance in empirical validities left to be explained by
systematic differences among the elements of the populations of settings or
situations.

In  summary,  there  is  some  evidence  for  small  but  statistically  reliable 
differential validities for some tests and job families. There is little
evidence  for  variance  in  empirical  validities  across  subpopulations  of 
individuals,  although  the  power  of  most  analyses  to  detect  substantial 
amounts  of  measurement  bias  is  very  low.  There  is  also  little  evidence  for 
systematic  variance  of  validities  across  situations  or  organizations. 

Generalization  Across  Time 


Time has seldom been explored as a source of systematic variance in
test validities. Variance of predictive validities across the time facet is
theoretically and practically important; it addresses important questions
related both to the stability of individual differences in abilities and to
dynamic vs. static criterion measures (Austin, Humphreys, & Hulin, 1989;
Barrett, Caldwell, & Alexander, 1985; Ghiselli, 1956). The stability of
both abilities and performance has implications for the scientific study of
human behavior that extend beyond the immediate, narrow question of
predictive validity generalization across time. These implications will be
addressed in the discussion section.

Theoretical Importance

Variance in predictive validity across time is theoretically important
for the study of individual differences in human abilities. There are three
possibilities that should be considered. First, predictive validities may be
constant, within the limits of sampling fluctuations, across time. Second,
they may vary randomly beyond the limits of sampling fluctuations. Third,
they may vary systematically across time, showing significant linear or
higher-order temporal trends.

If predictive validities are constant, initial predictions determined
from regression equations for early performance may be used for predictions
of later performance and provide reasonable bases for forecasting very
long-term performance and ability. If predictive validities vary randomly
across time, beyond the limits of sampling fluctuations, then there may be no
linear temporal factor involved in the variability. Other factors, such as
unreliability of performance, rapidly changing motivation, or social
competition, may be responsible for the observed fluctuations. If,
as the third alternative suggests, predictive validities vary systematically
and not randomly across time, then human abilities may also change
systematically. Changes in rank orders of individuals would be more likely
than stability along any given ability dimension. Such a dynamic conception
of human ability is not new (Alvares & Hulin, 1972, 1973; Dunham, 1974;
Humphreys, 1968). This dynamic interpretation of human abilities has
further implications for the definition of human ability and for
distinctions between human abilities on the one hand and skills and
knowledge on the other (Ackerman, 1989; Henry & Hulin, 1987, 1989). This is
a fundamental issue in this area; definitions and implicit assumptions about
human abilities determine many of the conclusions about the data reviewed in
this article.

Validities that change systematically across time perhaps should lead
us to question assumptions we make about intellectual, cognitive, or
psychomotor abilities defined as fixed capacities. Whatever the theoretical
basis of assumptions about fixed capacities (genetic determinants or events
during critical periods of development), the assumptions and the theories may
need revising. This fundamental assumption about human abilities needs to be
made explicit and its implications examined empirically whenever possible.
Humphreys (1985) and others (e.g., Wesman, 1956) have suggested that
abilities are neither fixed nor capacities; to define them in that
manner makes little sense theoretically or psychometrically.

Dynamic criteria represent the other side of the function linking
individual differences in abilities to individual differences in
performance. Just as we often make assumptions about the stability and
generality of human abilities, we make parallel assumptions about the
stability and specificity of skilled performance (Rothe, 1946a, 1946b, 1947,
1951, 1970, 1978; Rothe & Nye, 1958, 1959). These assumptions may also need
to be reexamined. That is, rank orders of individuals in terms of skilled
performance, even after group means and variances have stabilized, may be
less constant than is commonly assumed.

If rank orders of individuals in terms of levels of skilled performance
change systematically, a conceptualization of criterion performance in which
the amounts of the abilities required for performance on the criterion task
change systematically as a function of practice on the task is also
possible. This second view of skilled performance has been offered
previously as an explanation for decreasing predictive validities
(cf. Fleishman & Hempel, 1954).

Practical  Importance 

Temporal trends in predictive validities are also of practical
importance. Estimates of the utility of testing and selection programs are
often based on extrapolations from the validities of tests for predicting
performance during training or early in an individual's working career. If
predictive validities vary systematically across time, then extrapolating
utility estimates beyond the initial observation periods may lead to serious
errors. Periodic retesting of individuals' ability levels, to update the
information in prediction equations and to generate new predictions of
performance on the basis of periodic ability assessments, may need to become
a standard part of personnel selection programs. Alternatively, we may
need to recognize that our ability to predict long-term performance is very
limited; more modest claims for utility or predictive validities may be
needed.

In summary, a more thorough understanding of predictive validity should
include investigations of change across time and practice on the task, as
well as differences across subpopulations of individuals, abilities,
tasks/jobs, and situations. The relative lack of emphasis on the time
facet should be rectified if we are to develop dynamic models of
ability-performance relationships.


Goals  of  the  Study 

The  goals  of  this  study  are  to  examine  temporal  trends  in  predictive 
validity  coefficients  within  studies  and  to  accumulate  estimates  of  these 
temporal  trends  across  studies.  This  is  done  by  examining  trends  in  the 
validities  of  tests  for  predicting  individual  differences  in  criteria  at 
different  stages  of  practice  or  performance  within  each  study.  After 
examining  the  temporal  validity  trends  within  each  study  and  correcting  for 
relevant  statistical  artifacts,  the  results  are  combined  across  studies. 
General  and  consistent  temporal  trends  in  predictive  validities  across 
studies  would  suggest  restrictions  on  the  generalizability  of  predictive 
validities. Evidence of long-term as well as short-term validity of tests
as  predictors  of  performance  is  needed  for  complete  statements  about 
validity  and  utility. 

A  second  category  of  studies  was  included  in  addition  to  the  set  of 
studies  reporting  predictive  validities  as  normally  defined.  These  studies 
investigated  the  stability  of  ability  or  performance  measures  across  time. 
The  relevant  data  from  such  growth  and  development  studies  are  usually 
presented as a time period-by-time period or trial-by-trial matrix of
performance intercorrelations. The off-diagonal elements of the first row of
such a matrix represent the validity of performance on the first trial, or
during the first time period, for predicting performance during each of the
remaining n - 1 trials or time periods of the task (the second, third, and so
on). As such, this vector is analogous to a validity sequence extracted from the
usual  predictive  validity  studies  using  ability  measures  to  predict 
performance  during  sequential  trials  or  in  sequential  time  periods.  We  do 
not  claim  that  performance  during  the  first  trial  or  first  time  period  on  a 
skilled  task  or  an  ability  assessment  is  identical  to  the  usual  ability 
measures  used  as  predictors  of  skilled  performance.  We  do  argue  that 
distinctions  between  first  trial  performance  measures  and  a  job  sample  taken 
before  hiring  and  used  as  a  predictor  of  job  performance  are  more  apparent 
than  real.  We  maintained  the  distinction  between  the  typical  predictive 
validity  studies  and  the  growth  and  development  studies  by  analyzing  and 
reporting  the  results  separately. 
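To make the correspondence above concrete, the Python sketch below pulls the off-diagonal first row out of a trial-by-trial correlation matrix so that it can be handled like a predictive validity sequence. The function name and the 4-trial matrix are hypothetical, chosen only for illustration; they are not data from any study reviewed here.

```python
from typing import List, Sequence


def first_row_sequence(corr: Sequence[Sequence[float]]) -> List[float]:
    """Return the correlations of trial 1 with trials 2..n.

    `corr` is assumed to be a square, symmetric trial-by-trial
    correlation matrix given as a list of rows. The diagonal element
    corr[0][0] is skipped, leaving n - 1 coefficients that play the
    role of a validity sequence with trial 1 as the "predictor".
    """
    n = len(corr)
    if n < 2 or any(len(row) != n for row in corr):
        raise ValueError("need a square matrix with at least two trials")
    return [corr[0][j] for j in range(1, n)]


# Hypothetical 4-trial intertrial matrix (illustrative numbers only).
R = [
    [1.00, 0.70, 0.60, 0.52],
    [0.70, 1.00, 0.78, 0.70],
    [0.60, 0.78, 1.00, 0.81],
    [0.52, 0.70, 0.81, 1.00],
]
print(first_row_sequence(R))
```

The resulting list can then be passed to the same trend analysis used for ability-based validity sequences, which is why the two kinds of studies can be analyzed in parallel while still being reported separately.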
In one important respect, the analysis reported in this article is
substantially different from standard meta-analyses. Because the time facet
is  ordered  and  linear,  at  least  within  the  limits  of  the  studies  reviewed, 
the  validity  coefficients  obtained  in  any  study  can  be  ordered  along  this 
dimension;  time  provides  both  the  facet  and  the  metric  for  ordering  the 
observations  for  trend  analyses.  This  characteristic  of  time  enables  us  to 
go  beyond  simply  estimating  the  variance  in  validity  coefficients  due  to 
time  or  practice  on  the  task.  We  can  order  the  obtained  predictive  validity 
coefficients  and  test  for  systematic  temporal  trends.  There  is  no 
compelling  reason  to  assume  only  linear  temporal  trends  in  predictive 
validities,  but  there  are  normally  not  enough  observations  within  any  one 
study  to  estimate  any  higher  order  trends.  Therefore,  our  study  is  limited 
to  the  examination  of  simple  linear  trends. 

Artifact  Corrections 


Corrections  will  be  made  for  the  subset  of  the  possible  artifactual 
influences  that  can  affect  observed  trends  in  predictive  validities  within 
each study. After making these corrections, the within-study temporal
trends  will  be  accumulated  across  studies. 

If  there  were  differential  reliabilities  of  performance  measures  across 
the  time  intervals  or  observations  within  a  study,  we  corrected  the  observed 
validity  coefficients  for  differential  attenuation.  This  was  necessary  in 
order  to  estimate  the  temporal  trend  in  predictive  validity  coefficients 
within  each  study  unconfounded  by  systematic  trends  that  might  exist  in 
performance  reliability. 
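A minimal sketch of a correction for differential criterion attenuation follows, assuming the usual single-correction form in which each validity is divided by the square root of the reliability of the corresponding performance assessment. The helper that approximates reliabilities from correlations between adjacent assessments mirrors the fallback used when reliabilities were not reported; which neighbor is paired with each assessment is our assumption. All function names and numbers are hypothetical.

```python
import math
from typing import List, Sequence


def adjacent_trial_reliabilities(corr: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: approximate each assessment's reliability.

    Assumption: assessment j is paired with assessment j + 1, and the
    final assessment is paired with the one before it.
    """
    n = len(corr)
    if n < 2:
        raise ValueError("need at least two assessments")
    estimates = [corr[j][j + 1] for j in range(n - 1)]
    estimates.append(corr[n - 1][n - 2])
    return estimates


def correct_criterion_attenuation(
    validities: Sequence[float], reliabilities: Sequence[float]
) -> List[float]:
    """Divide each validity by the square root of its criterion reliability."""
    if len(validities) != len(reliabilities):
        raise ValueError("one reliability estimate is needed per validity")
    corrected = []
    for r, ryy in zip(validities, reliabilities):
        if not 0.0 < ryy <= 1.0:
            raise ValueError("reliability estimates must lie in (0, 1]")
        corrected.append(r / math.sqrt(ryy))
    return corrected


# Invented numbers: validities and criterion reliabilities both decline.
observed = [0.50, 0.45, 0.40]
reliability = [0.81, 0.64, 0.49]
print([round(v, 3) for v in correct_criterion_attenuation(observed, reliability)])
```

In this made-up example the observed validities decline (.50, .45, .40), but so do the reliabilities; after correction the sequence rises slightly (about .556, .5625, .571). This is the kind of confound between validity trends and reliability trends that the within-study correction is meant to remove.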

We also corrected the observed validity coefficients for differential
range restriction across performance assessments within studies.
Differential range restriction across observations, specifically decreases
in variance across observations, has been suggested as an explanation for
observed decreasing validities across time (Barrett et al., 1985).
Correcting for differential range restriction will allow an investigation of
this hypothesis.

This use of correction for range restriction is somewhat different
from the usual use of such corrections. Normally, corrections for range
restrictions  are  for  the  purposes  of  estimating  population  validities  from 
sample  validities  where  the  sample  may  be  more  or  less  variable  than  the 
population  to  which  one  wants  to  generalize.  Differential  range  restriction 
across  samples  can  introduce  artifactual  variance  in  sample  validity 
coefficients  (Hunter  et  al.,  1982).  Such  artifactual  variance  must  be 
removed  in  secondary  analyses  testing  hypotheses  about  situational  variance 
in  validity. 

In this study, we are not concerned about generalizations to
populations of individuals; those meta-analytic studies have been
conducted. We are concerned about artifactual influences on variance across
performance assessments. If there are ceiling effects on performance that
become more restrictive as the sample of individuals acquires more skill
across the different assessments in a study, then the variance in
performance will be artifactually restricted across time. Predictive
validities will appear to be lower for later assessments. Floor effects
during early performance that become less serious as practice or performance
continues may lead to increasing variance and artifactually increasing
validities. Misleading conclusions may be reached about differential
validity when the appropriate conclusions should be about differential range
restrictions across trials or performance assessments. We corrected
validity coefficients within studies for the differential range
restriction indicated by differential variance of performance across time.
Our intent was to obtain estimates of predictive trends within studies that
were not influenced by such artifacts. The details of the corrections for
range restriction and unreliability, as well as our methods for obtaining
the studies in our sample, are described in the method section below.
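The ceiling-effect argument can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that ability and performance are bivariate normal with a correlation of .50 and that a hypothetical ceiling caps every performance score above the performance mean; the function names and settings are ours, not the authors'.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_ceiling(rho=0.5, ceiling=0.0, n=20_000, seed=1):
    """Compare validity and criterion SD with and without a performance ceiling.

    Ability and performance are standard normal with correlation `rho`;
    capped performance is min(performance, ceiling). All settings are
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    ability, performance, capped = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # Construct performance so that corr(ability, performance) = rho.
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        ability.append(x)
        performance.append(y)
        capped.append(min(y, ceiling))
    return {
        "r_uncapped": pearson(ability, performance),
        "r_capped": pearson(ability, capped),
        "sd_uncapped": statistics.pstdev(performance),
        "sd_capped": statistics.pstdev(capped),
    }


for key, value in simulate_ceiling().items():
    print(f"{key}: {value:.3f}")
```

Under these assumptions the capped correlation should land near .43 (versus about .50 uncapped) and the performance standard deviation should shrink from about 1.0 to about .58. The lower validity reflects restricted criterion variance rather than any change in the underlying ability-performance relation, which is the artifact the range-restriction correction targets.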

II.  METHOD 

Our  search  of  the  literature  on  ability-performance  relations  spanned 
the  areas  of  prediction  of  performance  as  well  as  growth  and  development 
research.  Included  in  the  performance  prediction  domain  were  experimental 
studies  as  well  as  studies  of  academic  performance.  The  growth  and 
development  research  included  any  longitudinal  investigations  of 
ability/performance  in  which  intertrial  correlations  were  reported.  Many  of 
these were studies of intellectual abilities. Overall, 41 articles were
collected, yielding 77 independent validity sequences.

Data  Collection  Procedures 


The  collection  of  relevant  empirical  studies  began  with  a  search  for 
review  articles  in  the  various  subareas  mentioned  previously.  Those  used 
included  Ackerman  (1987),  Adams  (1987),  Alvares  and  Hulin  (1972),  Barrett, 
Caldwell,  and  Alexander  (1985),  Guion  and  Gibson  (1988),  and  Henry  and 
Hulin (1987). Because the scope and focus of these articles varied, many
empirical  studies  cited  in  these  articles  were  not  relevant  to  our 
investigation,  and  some  that  appeared  relevant  according  to  the  reviewing 
authors'  descriptions  did  not  include  the  necessary  information  for  use  in 
our  analyses.  Several  articles  not  included  contained  relevant  data  but 
presented  results  only  in  the  form  of  a  graph  of  predictive  validity  or 
stability  coefficients  against  time  or  the  trial's  ordinal  position  (e.g., 
Stelmach, 1969). To be included, we would have had to estimate the validity
or stability coefficients by extrapolating from the graph. Because most of
these studies were cumulative, rather than uniquely informative, they were
not included.

Potential  studies  were  examined  to  assess  their  appropriateness  for 
secondary  analyses.  Two  conditions  were  necessary  for  inclusion:  (a) 
longitudinal  correlational  analyses  (not  cross-sectional),  and  (b)  at  least 
three  correlations  between  ability  and  performance  at  different  times 
representing  predictive  validity  coefficients.  Cross-sectional  studies  were 
not  included  because  investigations  of  predictive  validities  across  time 
necessitate  the  use  of  longitudinal  designs.  Longitudinal  designs  are 
required  because  of  the  nature  of  some  of  the  hypotheses  concerning  changing 
validities (e.g., that rank orders of individuals change in terms of
abilities).  If  cross-sectional  samples  are  used,  the  effects  of  changing 
rank  orders  of  individuals  cannot  be  investigated. 

Studies were not included if only two validities were reported (e.g.,
Adams,  1957).  Others  were  omitted  because  only  factor  loadings  of 
performance  on  extracted  dimensions  were  given.  Of  the  studies  included, 
some  reported  multiple  sequences  of  validities  using  a  different  predictor 
for  each  sequence.  These  were  included  as  separate  validity  sequences  in 
our  analysis.  However,  if  multiple  sequences  of  validities  were  based  on 
subsets  of  the  total  sample,  say,  males  and  females,  only  the  total  sample 
validity  sequence  was  used  (if  reported). 

Although  the  search  was  systematic  and  extensive,  there  are  undoubtedly 
studies  that  were  not  located.  If  these  unlocated  and  unanalyzed  studies 
are systematically different in terms of temporal trends in validity, then
the  conclusions  reported  here  may  be  more  general  than  the  data  warrant.  It 
is  unlikely,  however,  that  our  study  is  plagued  by  "file  drawer"  problems 
(Rosenthal,  1979);  there  should  be  no  systematic  effect  on  the  publication 
of  studies  in  which  predictive  validities  are  stable,  increase  or  decrease 
systematically,  or  vary  widely  but  randomly.  Temporal  trends  in  validity 
coefficients  have  rarely  been  the  main  topic  of  interest  in  most  studies  of 
predictive  validity.  Positive,  negative,  or  even  zero  trends  in  predictive 
validities,  by  themselves,  should  not  directly  influence  decisions  by 
investigators  to  submit  manuscripts  for  publication  or  by  editors  to  accept 
or  reject  these  manuscripts. 


Statistical  Analyses 

The first phase of the analyses consisted of plotting the observed
predictive validities against time. For this analysis, time was treated
simply as an ordinal variable. Regression lines were fitted and the slopes
calculated for the within-study regression of predictive validity or
stability coefficients on time. Because most studies did not include a
sufficient number of data points to permit statistical tests of nonlinearity,
all such plots were examined visually for evidence of nonlinearity. There was
little nonlinearity evident in these plots. In addition, an index of validity
change across observations within studies was calculated by computing the
difference between the two endpoints of the regression line. These are the
predicted validities that correspond to the first and last observations in
the sequence. This difference represents the amount of decrease (negative
Δr̂) or increase (positive Δr̂) in validity as a function of time and practice
on the task. We used the difference between the predicted validities
corresponding to the first and last points on the regression line, Δr̂,
rather than the raw difference between the first and last coefficients, Δr,
to remove as much as possible the effects of within-study fluctuations in
the validity or stability sequence caused by sampling variance.
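The within-study trend computation described above can be sketched as follows. The sketch assumes equally spaced ordinal positions 1..k and ordinary least squares; it returns the slope, the difference between the fitted values at the first and last positions, the raw first-to-last difference, and the correlation between validity and ordinal position. The function name and the example sequence are hypothetical.

```python
import math
from typing import Sequence, Tuple


def validity_trend(validities: Sequence[float]) -> Tuple[float, float, float, float]:
    """Regress a validity sequence on ordinal position 1..k.

    Returns (slope, delta_r_hat, delta_r_raw, r_trend):
      slope        OLS slope of validity on ordinal position
      delta_r_hat  fitted value at the last position minus fitted value
                   at the first position
      delta_r_raw  last observed coefficient minus first
      r_trend      correlation of validity with ordinal position
                   (NaN if all coefficients are equal)
    """
    k = len(validities)
    if k < 3:
        raise ValueError("at least three validity coefficients are required")
    t = range(1, k + 1)
    mt = sum(t) / k
    mr = sum(validities) / k
    stt = sum((ti - mt) ** 2 for ti in t)
    srr = sum((ri - mr) ** 2 for ri in validities)
    sxy = sum((ti - mt) * (ri - mr) for ti, ri in zip(t, validities))
    slope = sxy / stt
    # The fitted endpoints differ by slope * (k - 1); the intercept cancels.
    delta_r_hat = slope * (k - 1)
    delta_r_raw = validities[-1] - validities[0]
    r_trend = sxy / math.sqrt(stt * srr) if srr > 0 else math.nan
    return slope, delta_r_hat, delta_r_raw, r_trend


print(validity_trend([0.50, 0.46, 0.40, 0.38]))
```

For the made-up sequence .50, .46, .40, .38 the slope is -.042 per assessment, so the fitted endpoints differ by 3 × (-.042) = -.126, compared with a raw difference of -.12; the validity-position correlation is about -.98. Because the regression-based index uses every coefficient in the sequence, a single aberrant first or last coefficient moves it less than it moves the raw difference.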

In the second phase, each observed predictive validity estimate was
corrected for range restriction, for unreliability, or, where possible, for
both. Changes in the standard deviations of performance across the
assessments in each study were used to correct for differential range
restriction. Standard deviations or variances were reported in 35% of the
studies; if they were not reported, no correction was made. Corrections for
differential range restriction were made by first calculating an average
standard deviation across trials, weighted by sample size. This weighted
average was then used in the formula for correcting for range restriction.
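The range restriction step can be sketched as below. The report does not name the correction formula, so this sketch assumes the widely used Thorndike Case II form and treats the sample-size-weighted average standard deviation as the reference value to which each trial's validity is rescaled. The function names and the numbers in the usage lines are hypothetical.

```python
from math import sqrt


def weighted_mean_sd(sds, ns):
    """Sample-size-weighted average of the per-trial standard deviations."""
    return sum(s * n for s, n in zip(sds, ns)) / sum(ns)


def correct_range_restriction(r, sd_trial, sd_reference):
    """Rescale r, observed with sd_trial, to the reference SD.

    Assumed Thorndike Case II form: r*u / sqrt(1 - r^2 + r^2 * u^2),
    with u = sd_reference / sd_trial. If the trial's SD exceeds the
    reference SD (u < 1), the correction lowers r.
    """
    u = sd_reference / sd_trial
    return r * u / sqrt(1 - r ** 2 + (r ** 2) * (u ** 2))


# Hypothetical study: three trials, 100 subjects each.
sds, ns = [10.0, 12.0, 14.0], [100, 100, 100]
reference = weighted_mean_sd(sds, ns)               # 12.0
first = correct_range_restriction(0.50, sds[0], reference)
third = correct_range_restriction(0.50, sds[2], reference)
```

Here a validity of .50 observed on the first trial rises to about .57, while the same value on the third trial falls to about .44. Under the assumed model, the rescaling removes the part of the trial-to-trial difference in validities that is due only to differences in the spread of performance scores.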

These corrected validities were also regressed on time, and the changes in
the corrected predictive validities corresponding to the first and last
observations were calculated (Δr*; the asterisk indicates that the
predictive validities have been corrected for whichever statistical
artifacts could be corrected given the available data). Differences between
the changes estimated from the regressions of the uncorrected and of the
corrected validities on time reflect the effects of statistical artifacts
on changes in the validities across time. Where available, the changes in
the corrected validities, Δr*, were used in all subsequent analyses. If
reliabilities were not reported (as was the case 90% of the time),
correlations between adjacent trials were used to estimate reliability in
the correction formula. The square roots of these correlations were used in
the denominator of the (Spearman) correction-for-attenuation formula. If
corrections for range restriction had been made in the previous step, these
corrected correlations were used in the numerator of the formula to correct
for unreliability; if the correction for range restriction could not be
made, the uncorrected correlations were used. If neither reliabilities nor
adjacent-trial correlations were reported, no correction for unreliability
was made.
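The attenuation step can be sketched as below, assuming the caller supplies, for each performance assessment, its correlation with an adjacent trial as the reliability estimate (the report does not say which neighboring trial was used). Only criterion unreliability is corrected, matching the single square root in the denominator described above. The function name is hypothetical, and the guard against non-positive estimates is an added assumption.

```python
from math import sqrt
from typing import Optional


def correct_attenuation(r: float, adjacent_trial_r: Optional[float]) -> float:
    """Correct a validity coefficient for unreliability in the criterion.

    r may already be corrected for range restriction. adjacent_trial_r is
    the correlation between this assessment and an adjacent one, used as
    the reliability estimate. If no usable estimate exists, r is returned
    unchanged, i.e., no correction for unreliability is made.
    """
    if adjacent_trial_r is None or adjacent_trial_r <= 0:
        return r
    return r / sqrt(adjacent_trial_r)


# Hypothetical values: range-corrected validity .45, adjacent-trial r .81.
corrected = correct_attenuation(0.45, 0.81)
```

In this example the corrected validity is .45 / .90 = .50. Note that the corrected value can exceed 1.0 when the reliability estimate is small relative to r.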

A final index of change in validity sequences as a function of time was
computed within each study by correlating validity (corrected for
unreliability and range restriction where the necessary data were reported)
with the ordinal time variable.
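This within-study index is an ordinary Pearson correlation between the (corrected) validities and their ordinal positions. A minimal sketch with a hypothetical function name; it raises an error when the validities do not vary, since the correlation is then undefined.

```python
from math import sqrt


def validity_time_correlation(validities):
    """Pearson r between validities (in temporal order) and positions 1..k."""
    k = len(validities)
    if k < 2:
        raise ValueError("need at least two validity coefficients")
    positions = range(1, k + 1)
    t_bar = (k + 1) / 2
    r_bar = sum(validities) / k
    sxy = sum((t - t_bar) * (r - r_bar) for t, r in zip(positions, validities))
    sxx = sum((t - t_bar) ** 2 for t in positions)
    syy = sum((r - r_bar) ** 2 for r in validities)
    if syy == 0:
        raise ValueError("validities are constant; correlation undefined")
    return sxy / sqrt(sxx * syy)


# Hypothetical validity sequence (not taken from Table 1).
r_time = validity_time_correlation([0.50, 0.42, 0.45, 0.36, 0.30])
```

With only a few data points per study, these correlations can be extreme (a three-point sequence easily yields a value near -1), which matters when they are later averaged through Fisher's z.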

The third phase of the analysis consisted of combining within-study
temporal trends in validity coefficients across studies to provide an
overall estimate of the trends in predictive validities. This analysis was
not straightforward. Choosing a reasonable metric for time that was both
sensitive to small time differences within studies and meaningful across
studies is difficult; the time periods ranged from indices of scientific
productivity across decades to several 1- or 2-minute measures of
performance on psychomotor tasks within a 1-hour experiment. We used two
different representations of time: a natural log transformation of time and
a simple ordinal time metric. The former may assign unrealistically large
values to later observations in a very long-term longitudinal study; the
latter discards information by providing only rank-order values and may
assign unrealistically small values to later observations in long-term
longitudinal studies. We report Δr regressed on both rank-ordered time and
the natural log of time.
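The two time metrics can be sketched as follows. This assumes each study contributes one Δr and one total duration; durations are converted to hours, then either log-transformed or replaced by ranks (tied durations receive their average rank, an assumed convention), and Δr is regressed on each metric by ordinary least squares. The unit table (including 365.25-day years), the example durations and Δr values, and all function names are illustrative assumptions. As a check against Table 1, an 8-minute study is 0.13 hours, with a natural log of about -2.01.

```python
from math import log


def to_hours(value, unit):
    """Convert a study duration to hours (unit names are illustrative)."""
    factors = {"min": 1 / 60, "hour": 1.0, "day": 24.0, "year": 24.0 * 365.25}
    return value * factors[unit]


def average_ranks(values):
    """Rank values from 1 (shortest) upward; ties get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical studies: durations and their within-study decrements.
hours = [to_hours(8, "min"), to_hours(3, "year"), to_hours(10, "day")]
delta_r = [-0.07, -0.30, -0.20]
slope_log_time = ols_slope([log(h) for h in hours], delta_r)
slope_rank_time = ols_slope(average_ranks(hours), delta_r)
```

The log metric preserves the spacing between very short and very long studies while compressing it; the rank metric keeps only their order, which is the trade-off described above.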


III.  RESULTS 

Table 1 presents the summaries of our predictive validity results. The
columns, from left to right, represent: (a) the number of observations or
data points in each validity sequence, (b) the amount of time elapsed
during the period of data collection, (c) N, the number of subjects,
(d) Δr_a, the decrement in validity corrected for range restriction, (e) the
decrement in validity corrected for unreliability, (f) the decrement in
validity corrected for both sources of artifactual variance, (g) the
correlation between time or assessment period and validity, (h) the time
elapsed in hours, and (i) the natural log transformation of the time
elapsed.

Of the prediction studies, 82% (44 of 54) showed decreasing validity
patterns as measured by the change in (uncorrected) validity calculated
from the regression of validity on time (Δr). When the validities were
corrected for range restriction (Δr_a), the decrease became even more
pronounced in 47% of the sequences (7 of 15). Similarly, when the observed
validities were corrected for unreliability, the decrease was stronger in
81% of the sequences (25 of 31). Correcting for both statistical artifacts
yielded a more negative Δr in 86% of the cases in which it was possible to
correct for both (12 of 14). Overall, only 10 validity sequences yielded
positive or zero uncorrected Δr's; none of the validity sequences yielded
positive or zero Δr's when corrections were made for both unreliability and
range restriction. The average corrected and uncorrected Δr's are given at
the bottom of Table 1. The average correlation between the corrected
predictive validities making up each validity sequence and the temporal
rank order of the observations was -.80 (r-to-z transformation, weighted by
the number of observations). This average correlation represents the degree
of within-study correlation between the ordinal position of the observation
within the study and the predictive validity of the test used to forecast
task performance. Both the size of this correlation and a perusal of
Table 1 suggest a great deal of consistency in the relation between the
temporal position of performance and the predictive validity of tests
across a variety of tasks, populations, and situations.
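The averaging of within-study correlations can be sketched with Fisher's r-to-z transformation, weighting each study's z by its number of data points as described above. How the original analysis handled correlations of exactly ±1 (possible with very short sequences) is not stated; this sketch simply rejects them. The function name and example values are hypothetical.

```python
from math import atanh, tanh


def weighted_mean_correlation(correlations, weights):
    """Average correlations via Fisher's r-to-z with per-study weights.

    Each r is transformed to z = atanh(r), the z values are averaged with
    the given weights (here, number of data points per study), and the
    mean z is transformed back with tanh.
    """
    if any(abs(r) >= 1 for r in correlations):
        raise ValueError("Fisher z is undefined for |r| = 1")
    z_bar = sum(w * atanh(r) for r, w in zip(correlations, weights)) / sum(weights)
    return tanh(z_bar)


# Hypothetical: two studies with 4 data points each.
average_r = weighted_mean_correlation([-0.80, -0.60], [4, 4])
```

For these two equally weighted studies the z-based average is -5/7 (about -.71) rather than the simple mean of -.70, because the z transformation stretches values near ±1.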

The average decrements in the validity coefficients range from -.15 when no
corrections were made to -.60 when corrections could be made for both
differential range restriction and attenuation within studies. The 90%
confidence intervals for the decrements that could be corrected for at
least one of the potential statistical artifacts never included zero, and
none of the individual decrements in corrected validity coefficients was
zero or positive. The average corrected validity decrement, -.60, may not
be the best measure of the central tendency of the distribution of
corrected validity decrements because of one extreme value, -1.21. The
median of the distribution of corrected validity decrements is -.29; the
mean of the distribution after discarding the two most extreme values,
-1.21 and -.10, is -.45. Although somewhat discrepant, either of these
latter estimates probably summarizes the central tendency of the
distribution more accurately. The mean of the trimmed distribution, -.45,
is more consistent with the overall information contained in this analysis.
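The three summaries of the decrement distribution can be computed as below. The trimmed mean drops the single smallest and single largest value, matching the removal of one extreme value at each end described above. The input values are invented for illustration and are not the Table 1 decrements; the function name is hypothetical.

```python
from statistics import mean, median


def summarize_decrements(decrements):
    """Return the mean, median, and mean after dropping the min and max."""
    if len(decrements) < 3:
        raise ValueError("need at least three values to trim both ends")
    ordered = sorted(decrements)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "trimmed_mean": mean(ordered[1:-1]),  # drop one value at each end
    }


# Invented decrements for illustration only.
summary = summarize_decrements([-1.2, -0.5, -0.3, -0.2, -0.1])
```

With these values the ordinary mean is -.46, the median -.30, and the trimmed mean about -.33: one extreme value pulls the ordinary mean away from the other two summaries.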

The average within-study correlation (z-transformation, weighted by the
number of data points within the study) between the time of the
performance assessment and the validity of the test for predicting that
performance assessment was -.80. This correlation is highly significant and
attests to the consistency and significance of the validity decrement
across time within each study.
Table 1. Summary of Predictive Validity Results
(predicting Std. Rudder Control)

Standard  Rudder  Control  5  8  min.  356  -.07  -  -.12  -  -  .78  .13  -2.01 

Standard  Rudder  Control  4  8  min.  356  -.14  -  -  -  -  .96  .13  -2.01 

(6-Target  Rudder  Control) 


* Abilities hypothesized to be stable by the primary investigators.


Table 2 shows the summaries of the intertrial performance studies, in which
the same performance measure was assessed across trials or time periods.
These studies are fewer in number, but their results provide even stronger
evidence for the trends found in Table 1. All of the Δr's, both corrected
and uncorrected, are negative. The correlations between time and r are also
negative in every case. As with the prediction studies, correcting for
unreliability, range restriction, or both made the decrease in Δr even
greater in 22 of the 23 cases. The average correlation between corrected
predictive validities and time was -.94 (r-to-z transformation, weighted by
the number of within-study data points).
Table 2. Summary of Stability of Performance Results
IV. DISCUSSION


Previous secondary analyses have investigated the generalizability of
validities across populations, situations, abilities, and tasks. These
analyses have concluded that observed variance in test validities across
these populations is substantially due to statistical artifacts. Some
researchers have been willing to argue that validities generalize across
these facets almost without limit (Schmidt & Hunter, 1977, 1981). In
contrast, our secondary analysis of predictive validities across time has
demonstrated that validities should not be generalized across this facet.
Time, a relatively unstudied facet in validity generalization research, has
a consistent effect on predictive validities. Validities vary across time;
with few exceptions, they decrease monotonically.

This secondary analysis began with a specific set of hypotheses about the
nature of variance in predictive validities. We were not concerned with
simply estimating the variance due to artifacts or to design features of
the studies in our sample. We were able to formulate specific hypotheses on
the basis of previous summaries of the literature (Alvares & Hulin, 1972,
1973; Henry & Hulin, 1987). These hypotheses addressed the nature of the
variance of predictive validities: predictive validities should decrease
monotonically with time. Failure to reject the null form of this hypothesis
(i.e., no temporal, monotonic decrease) is more informative than rejecting a
simpler hypothesis that the variance in predictive validities is greater
than would be predicted by sampling fluctuations, differences in
reliabilities, and differences in the variance of performance assessments.

Past researchers who have discussed decreasing predictive validities across
time in organizational settings have attributed the observed decrement to
statistical artifacts (Barrett et al., 1985); that is, differential range
restriction and unreliability across time periods or trials were the
putative reasons for the observed decreases. Among the studies providing
the information necessary to correct for either or both of these
statistical artifacts, 84% (27/32) of the prediction studies shown in
Table 1 and 96% (22/23) of the stability, growth, and development studies
shown in Table 2 revealed that predictive validities decreased more when
corrected for these artifacts than did the uncorrected validities. None of
the studies with positive slopes for the regression of predictive
validities on time contained the information necessary to correct for the
artifacts; it is unknown whether these slopes would remain positive if the
validities could be corrected for differential unreliability and range
restriction across performance assessments.

This finding of greater temporal variance following corrections for
statistical artifacts stands in sharp contrast to other secondary analyses
of variance in predictive validities across facets or populations. These
previous analyses found that the observed variance in predictive validities
was generally attributable to artifacts. Our analyses revealed that
removing the statistical artifacts increased the negative slopes of the
regressions of validity on time and, hence, the variance of predictive
validities accounted for by time.
The pervasiveness of this systematic decrease in validities can be seen by
reviewing Tables 1 and 2. Forty-four of the 54 validity sequences included
in Table 1 and all 23 of the validity sequences in Table 2 had negative
slopes for the regressions of predictive validity on time. The average
within-study correlation between predictive validity and the ordinal
position of the performance assessment was -.80 in Table 1 and -.94 in
Table 2. The number of observations in the studies ranges from 3 to 22. The
durations of the studies range from 8 minutes to nearly 22 years among the
prediction studies, and up to 60 years among the stability, growth, and
development studies. The types of abilities investigated range from
specific and narrow (e.g., simple reaction time, discriminant reaction
time) to broad and general (e.g., general intellectual ability). The
performance predicted in the studies ranged from the specific (pursuit
rotor performance) to the very general (flight performance). The
populations sampled ranged from groups highly selected on the abilities and
skills being studied (professional baseball players) to samples from
student populations. Laboratory and field studies were both well
represented. There were few exceptions to the observed decreasing trends in
predictive validities.

The one striking exception to the trends observed in the data in Table 1 is
found in a series of studies conducted by Powers (1982) and Winterbottom et
al. (1963) predicting grades in law schools from undergraduate grades and
Law School Aptitude Test (LSAT) scores. These studies found that although
LSAT validities declined consistently across the 3 years of law school, the
validity estimates for undergraduate grades did not show the expected
decrement. Both of these trends, the negative temporal trend in LSAT
validities and the zero or slightly positive trend in the predictive
validity of undergraduate grades, were consistent across more than 20
different law schools. This difference between the validity sequences
suggests that the zero or positive slope for the validity of undergraduate
grades cannot be attributed to criterion contamination or related criterion
problems: the same criterion produced opposite trends in the same sample of
law schools.

There is no obvious explanation for the discrepant trends found in law
school grades, nor is there any obvious explanation for the difference in
the trends between LSAT scores and undergraduate grades as predictors. In
spite of the apparent generality across situations reported by Schmidt and
Hunter (1977, 1981), law schools may represent a significantly different
situation for temporal generalizations.

Across studies, the regression of Δr on the number of observations per
validity sequence showed that decrements in validity became more pronounced
as the number of data points increased. This regression was -.51 across the
prediction studies in Table 1 and -.38 across the studies in Table 2.

Other deviations from the overall trends in Table 1 should also be noted.
Fleishman and Rich (1963) reported an increasing correlation between
kinesthetic sensitivity and psychomotor performance. These authors
predicted this increasing, rather than decreasing, correlation on the basis
of a conceptual explanation of the generally observed decreases in
predictive validities that stressed changes in the abilities required for
performance as a function of practice on the task.

Hinrichs  (1970)  reported  generally  increasing  correlations  between 
pretest  measures  and  performance  across  different  trials  on  a  psychomotor 
task.  Although  one  of  these  increasing  validity  sequences  had  been 
predicted  by  Hinrichs,  the  extreme  amount  of  within  study  fluctuation  in 
predictive  validities  from  trial  to  trial  and  the  very  small  sample  size 
make  the  significance  of  the  trends  difficult  to  interpret. 

Three  additional  increasing  validity  sequences  were  reported  by  Kaufman 
(1972).  These  increasing  validity  sequences  involved  scientific  performance 
measures  including  papers  written  and  patent  disclosures.  Both  criterion 
contamination  and  situational  variance  may  partially  account  for  these 
discrepant  findings. 

In general, aside from undergraduate grade point average predicting law
school grades and the increasing validity of a measure of kinesthetic
sensitivity for predicting psychomotor performance, the exceptions to the
observed general trends in predictive validities seem to represent
anomalies rather than significant departures from general findings that
need to be explained. The increasing validity sequence for psychomotor
performance needs to be replicated. If the increasing trend is replicated,
it would lend support to an explanation attributing changes in validities
to changes in the abilities required for performance on the task.

We  have  not  attempted  to  weight  estimated  "effect"  sizes  by  sample 
sizes  for  each  study  to  obtain  an  expected  effect  size.  This  weighting 
procedure is justified when the effect sizes being estimated have some
meaning  when  applied  to  individuals.  That  is,  if  the  effects  represent  the 
expected  change  that  may  occur  in  an  individual  as  a  result  of  the 
experimental  manipulation  or  naturally  occurring  event,  such  weighting  and 
estimation  procedures  are  reasonable.  The  dependent  variables  analyzed  in 
this  secondary  analysis  were  correlations  and  changes  in  correlations  that 
have  a  meaning  for  a  study  or  for  a  group  as  an  undifferentiated  whole;  they 
have  no  meaning  in  this  context  when  disaggregated  to  individual  data. 

Although  we  analyzed  the  effects  of  time  on  validity,  we  do  not  imply 
that  time  per  se  was  the  causal  factor  in  the  observed  validity  decrements. 
Those  things  that  occur  while  individuals  are  learning  and  performing  jobs 
and  during  skill  acquisition  are  the  assumed  causal  agents.  Time  is 
necessary  to  allow  these  things  to  occur  and  is  a  convenient  metric  in  the 
absence  of  more  specific  indicators.  Studies  of  the  effects  of  the  specific 
events  indexed  by  time  are  the  obvious  next  steps  in  this  area. 

The  theoretical  and  practical  implications  of  these  findings  need  to  be 
addressed  in  detail  in  laboratory  and  field  studies.  In  this  paper,  given 
that  our  goals  were  to  establish  the  existence  and  form  of  any  temporal 
relationship  with  predictive  validities,  we  can  discuss  them  only  briefly 
(for  a  more  in-depth  treatment,  see  Alvares  &  Hulin,  1973;  Henry  &  Hulin, 
1987). 
Theoretical Implications


Two theoretical explanations for the observed decrement in predictive
validities were discussed by Alvares and Hulin (1972, 1973). The
explanatory power of the two explanations was compared in an experimental
study of pursuit rotor performance by Dunham (1974). Briefly, Fleishman
(1960) advanced an explanation for the observed decrement that stressed
changes in the combination of abilities required to perform the task. These
hypothesized changes in the abilities required by the task occur as a
result of practice and increasing task proficiency. Adams (1957), Alvares
and Hulin (1972, 1973), and Bechtoldt (1960, 1961) have discussed flaws in
the empirical support offered by Fleishman (1960) for this explanation of
validity decrements.

An  alternative  explanation  stressing  changes  in  individuals'  abilities, 
as  a  function  of  practice  on  tasks  requiring  those  abilities,  was  discussed 
in  detail  by  Alvares  and  Hulin  (1972,  1973).  This  explanation  explicitly 
rejects  an  assumption  of  fixed  abilities.  It  assumes  instead  that 
individuals' ability levels change as a function of complex skill
acquisition. Abilities have been defined as the current
repertoire of relevant skills and knowledge possessed by an individual
(Hulin & Humphreys, 1980; Humphreys, 1985; Wesman, 1956) rather than as fixed
capacities. Individuals' ability levels are assumed to undergo significant
changes  whenever  they  acquire  proficiency  in  complex  tasks.  Further,  the 
rank  order  of  individuals  in  terms  of  their  relevant  skills  and  abilities 
does  not  remain  constant  during  the  process  of  skill  acquisition.  Some 
individuals  exhibit  greater  changes  than  others  in  the  abilities  related  to 
task  performance.  According  to  this  explanation  for  validity  decrements, 
the  set  of  abilities  required  for  task  performance  does  not  change;  the 
amounts  of  the  abilities  individuals  have  change.  Specifically,  the  amounts 
of  the  relevant  abilities  differ  from  early  to  late  in  performance  or 
learning.  Predictive  validities  of  late  performance  that  are  based  on  early 
ability assessments (i.e., those taken before individuals begin a job or
practice  a  task)  are  low  because  ability  levels  assessed  before  performance 
has  started  may  be  only  moderately  related  to  the  ability  levels  individuals 
have  late  in  performance. 
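A worked derivation can show why early ability assessments lose validity even when the task's ability requirement never changes. It rests on assumptions of ours, made for illustration only. The pre-practice test measures ability A_0 without error. Ability changes by independent increments d_s, each with variance δ². Performance on trial t is current ability plus independent error, P_t = A_t + e_t, with error variance σ_e².

```latex
\begin{aligned}
A_t &= A_0 + \sum_{s=1}^{t} d_s, \qquad P_t = A_t + e_t,\\
\operatorname{Cov}(A_0, P_t) &= \sigma_0^2, \qquad
\operatorname{Var}(P_t) = \sigma_0^2 + t\,\delta^2 + \sigma_e^2,\\
\rho(A_0, P_t) &= \frac{\sigma_0^2}{\sigma_0\sqrt{\sigma_0^2 + t\,\delta^2 + \sigma_e^2}}
 = \frac{\sigma_0}{\sqrt{\sigma_0^2 + t\,\delta^2 + \sigma_e^2}}.
\end{aligned}
```

Under these assumptions the predictive validity falls monotonically with t. The required ability is always the same; only the individuals' levels of that ability drift away from the levels measured at the outset.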

A  competitive  test  of  these  two  explanations  in  terms  of  their  ability 
to  explain  the  decrement  in  predictive  validities  over  time  and  practice 
showed  that  a  number  of  the  hypotheses  based  on  the  changing  ability  levels 
explanation were supported (Dunham, 1974). However, that explanation was
not able to account for all of the observed validity decrement. Under the
changing ability levels explanation, a postdictive validity sequence (the
correlations between ability tests given after training on a task and
performance on each trial) should have had a positive slope that mirrored the
negative slope of the predictive validity sequence based on tests given before
practice on the task. Although the postdictive validity for performance on the
final trial was greater than the predictive validity for the final trial, it
was not as high as the predictive validity for the initial trial. Similar
results were obtained by Alvares and Hulin (1973). Dunham (1974) concluded
that there was no empirical evidence supporting the explanation based on
changing task requirements; there was support for the explanation based on
changing subject ability levels, but it could not explain the entire validity
decrement.
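The mirror-image expectation can be made concrete with a minimal sketch. It assumes a pure changing-ability model: ability grows by independent increments, performance on each trial equals current ability plus independent error, and both tests measure ability without error. The variance components and trial count are hypothetical values, not estimates from Dunham (1974). The pre-practice test measures A_0 and the post-practice test measures the final ability A_T.

```python
import math

# Hypothetical variance components (assumptions, not estimates from the studies reviewed).
SIGMA0_SQ = 1.0   # variance of ability at the pre-practice test
DELTA_SQ = 0.25   # variance of each independent ability increment per trial
ERROR_SQ = 0.5    # variance of trial-specific performance error
T = 8             # number of practice trials


def ability_variance(t):
    """Var(A_t) when A_t = A_0 + d_1 + ... + d_t with independent increments."""
    return SIGMA0_SQ + t * DELTA_SQ


def predictive(t):
    """corr(A_0, P_t): pre-practice test vs. performance on trial t."""
    return math.sqrt(SIGMA0_SQ) / math.sqrt(ability_variance(t) + ERROR_SQ)


def postdictive(t, final=T):
    """corr(A_final, P_t): post-practice test vs. performance on trial t (t <= final)."""
    # Cov(A_final, A_t) = Var(A_t) because increments after trial t are independent of A_t.
    cov = ability_variance(t)
    return cov / math.sqrt(ability_variance(final) * (ability_variance(t) + ERROR_SQ))


pred = [predictive(t) for t in range(1, T + 1)]
post = [postdictive(t) for t in range(1, T + 1)]
for t, (p, q) in enumerate(zip(pred, post), start=1):
    print(f"trial {t}: predictive={p:.3f}  postdictive={q:.3f}")
```

In this idealized model the postdictive validity for the final trial (about .93) exceeds the predictive validity for the initial trial (about .76). The empirical pattern described above, in which the final postdictive value fell short of the initial predictive value, is the sense in which ability change alone did not account for the whole decrement.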

A  third  explanation  emphasizes  social  factors  in  task  performance  and 
skill  acquisition.  That  is,  individual  performance  is  hypothesized  to  be  a 
function  of  two  independent  factors:  relevant  abilities  and  the  ability  or 
skill  level  of  the  group  with  which  the  individual  is  competing,  learning 
the skill, or performing their job. This explanation assumes that
individuals  know  the  average  performance  level  of  the  selected  group  of 
which  they  are  a  member.  Those  well  below  the  average  group  performance  on 
the  task  at  any  given  time  are  expected  to  increase  their  efforts  on  the  job 
or  task;  those  well  above  the  group  average  may  slacken  their  efforts 
relative  to  other  group  members.  Thus,  regression  to  the  mean  of  the 
selected  group  is  offered  as  an  explanation  for  validity  decrements  (L.G. 
Humphreys,  personal  communication). 

This  explanation  has  a  great  deal  of  appeal  for  explaining  within  group 
validity  decrements  that  occur  in  groups  that  interact  a  great  deal  during 
training  or  on  the  job.  Such  selected  groups  as  pilot  trainees,  law  school 
students,  and  employees  in  an  organization  have  this  characteristic.  Within 
group  competition  may  be  a  powerful  factor  in  influencing  group  members' 
performance levels. By contrast, the "groups" created in laboratory studies are often little
more  than  collections  of  individuals  aggregated  for  purposes  of  data 
analyses.  Interactions  among  the  members  of  the  experimental  and  control 
groups  in  most  of  these  studies  are  minimal  or  nonexistent.  The  social 
competition  explanation  loses  much  of  its  intuitive  appeal  when  applied  to 
validity  decrements  observed  in  these  experimental  studies. 

Practical  Implications 


The  practical  implications  of  these  three  theoretical  explanations  for 
observed  validity  decrements  are  substantially  different.  The  first  two 
(ability-based)  explanations  suggest  that  both  predictive  validities  and  the 
practical  utility  of  selection  programs  decrease  over  time  and  are 
temporally  limited  to  early  performance  on  a  task  or  job  or  to  performance 
during  training.  If  abilities  required  for  late  performance  are  independent 
of those required for early performance, and if, as our results suggest,
nearly  all  commonly  assessed  abilities  are  those  that  are  required  for  early 
rather  than  late  performance,  then  both  the  within  group  predictive  validity 
and  utility  of  selection  programs  will  decrease  concomitantly.  If  abilities 
change  significantly  as  a  result  of  practice  on  the  task,  and  if  ability 
increments at time i + 1 are independent of ability level at time i, then
after extensive practice on a task, ability levels should be nearly
independent of the ability levels measured when individuals were selected.
Extensive research by Humphreys (1960) and Humphreys and Davey (1984) has suggested
that  this  hypothesized  form  of  ability  change  cannot  be  rejected.  Matrices 
of  time  period  by  time  period  correlations  of  ability  levels  generally  show 
an  excellent  fit  to  a  simplex  matrix  (Guttman,  1955). 
If the third explanation, based on social competition among the members
of the selected group, is correct, then decrements in predictive
validities are not necessarily accompanied by decrements in the utility of a
selection test or program. As long as performance regresses toward the mean of
the selected group, that mean may remain above the mean performance of the
unselected population, provided the test was initially a valid predictor of
performance in the overall population.
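The distinction between validity and utility under the social competition explanation can be illustrated with a small simulation. It is a minimal sketch, and every parameter in it is hypothetical. A test with validity .6 in the population selects the top 30%. In each later period, selected members close a fixed fraction K of the gap to the current group mean, with members below the mean raising effort and members above it slackening, and random noise is added. Population performance is centered on 0 by construction. The sketch does not model how the unselected population's performance would itself change.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1987)
N = 10_000
VALIDITY = 0.6      # hypothetical population validity of the selection test
SELECT_FRAC = 0.3   # hypothetical selection ratio
K = 0.5             # hypothetical pull toward the group mean in each period
NOISE_SD = 0.3      # hypothetical period-to-period performance noise
PERIODS = 5

# Unselected population: test scores and initial performance, both centered on 0.
test = [rng.gauss(0, 1) for _ in range(N)]
perf = [VALIDITY * t + (1 - VALIDITY ** 2) ** 0.5 * rng.gauss(0, 1) for t in test]

# Select the top SELECT_FRAC of the population on the test.
cutoff = sorted(test, reverse=True)[int(N * SELECT_FRAC) - 1]
selected = [i for i in range(N) if test[i] >= cutoff]
g_test = [test[i] for i in selected]
g_perf = [perf[i] for i in selected]

validities, means = [], []
for period in range(PERIODS + 1):
    validities.append(pearson(g_test, g_perf))
    group_mean = sum(g_perf) / len(g_perf)
    means.append(group_mean)
    if period < PERIODS:
        # Members below the group mean raise their effort; those above slacken.
        g_perf = [p + K * (group_mean - p) + rng.gauss(0, NOISE_SD) for p in g_perf]

for period, (v, m) in enumerate(zip(validities, means)):
    print(f"period {period}: within-group validity={v:.3f}, group mean={m:.3f}")
```

In this sketch the within-group validity falls from roughly .36 toward zero. The adjustment step leaves the group mean unchanged apart from noise, so the selected group stays well above the population mean of 0. The within-group validity decrement therefore does not by itself imply a loss of selection utility.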

The  changing  ability  level  explanation  offered  for  validity  decrements 
suggests  a  need  to  develop  theories  of  human  ability  and  human  performance 
that  incorporate  change.  That  is,  rather  than  relying  on  static  models  of 
human  ability  in  which  ability  levels  are  assumed  to  be  fixed,  dynamic 
models  should  be  developed.  These  models  would  allow  for  systematic  changes 
in  ability  level  as  a  function  of  learning,  or  practice  on,  a  complex  skill. 
The levels of the abilities required for task performance would be assumed to change as
practice continues. Initial ability levels could be used to predict initial
task  performance.  Performance  on  the  task  late  in  learning  is  assumed  to  be 
a  function  of  the  same  abilities  as  those  related  to  initial  performance. 
However,  either  updated  assessments  of  these  abilities  would  be  needed  to 
predict  later  performance,  or  initial  abilities  plus  a  set  of  factors 
related  to  changes  in  abilities  would  be  required  to  predict  late 
performance.  This  set  of  factors  related  to  change  in  ability  would  not 
necessarily  be  related  to  initial  performance.  Their  relation  to  late 
performance  would  be  through  their  effects  on  ability  levels  that  were 
changed  as  a  result  of  learning  a  complex  task.  The  outcome  of  such  a 
dynamic  theory  depends  on  identifying  and  assessing  the  set  of  individual  or 
individual/environmental  interaction  factors  related  to  ability  change. 
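The two prediction routes described above can be compared in a minimal simulation. Both routes are (a) an updated ability assessment and (b) initial ability plus a factor related to ability change. The sketch assumes that late ability equals initial ability plus a change produced by one hypothetical "change factor" that is independent of initial ability. It also assumes that performance at any point equals current ability plus independent error. The change factor, error variance, and sample size are illustrative assumptions, not quantities identified in the literature reviewed.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared_two(r_y1, r_y2, r_12):
    """Squared multiple correlation of y on two predictors, from zero-order correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


rng = random.Random(42)
N = 20_000
ERROR_SD = 0.5  # hypothetical performance error

initial_ability = [rng.gauss(0, 1) for _ in range(N)]
change_factor = [rng.gauss(0, 1) for _ in range(N)]  # hypothetical driver of ability change
# Assumption: late ability = initial ability + change produced by the change factor.
late_ability = [a + c for a, c in zip(initial_ability, change_factor)]
# Same task requirement early and late: performance = current ability + error.
initial_perf = [a + rng.gauss(0, ERROR_SD) for a in initial_ability]
late_perf = [a + rng.gauss(0, ERROR_SD) for a in late_ability]

r2_initial_only = pearson(initial_ability, late_perf) ** 2
r2_initial_plus_change = r_squared_two(
    pearson(initial_ability, late_perf),
    pearson(change_factor, late_perf),
    pearson(initial_ability, change_factor),
)
r2_updated = pearson(late_ability, late_perf) ** 2
r_change_initial_perf = pearson(change_factor, initial_perf)

print(f"R^2, initial ability only:             {r2_initial_only:.3f}")
print(f"R^2, initial ability + change factor:  {r2_initial_plus_change:.3f}")
print(f"R^2, updated ability assessment:       {r2_updated:.3f}")
print(f"r(change factor, initial performance): {r_change_initial_perf:.3f}")
```

With these settings initial ability alone explains about 44% of late-performance variance. Adding the change factor, or reassessing ability late, raises this to about 89%. The change factor is essentially uncorrelated with initial performance, which mirrors the point above that such factors need not relate to early performance at all.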

The  changing  task  explanation  for  validity  decrements  requires  a 
somewhat different research strategy. Instead of searching for a set
of  factors  that  are  related  to  ability  change  within  individuals,  this 
explanation  would  direct  us  to  search  for  a  set  of  abilities  that  are 
uniquely  related  to  late  performance  on  relevant  criterion  tasks. 

Regression  equations  predicting  performance  at  different  stages  of  practice 
would  be  characterized  by  a  gradual  decline  in  the  sizes  of  the  regression 
weights  assigned  to  "early"  abilities,  those  abilities  related  to  initial 
performance  levels,  and  a  gradual  increase  in  the  sizes  of  regression 
weights assigned to "late" abilities. The outcome of this search for these
"late" abilities is an empirical question. Past work by Fleishman (e.g.,
1960) does not provide a great deal of encouragement for those
interested in this line of inquiry, although Ackerman (1987) has recently
developed a theoretical framework consistent with this approach that may
offer non-obvious insights and promise. Nevertheless, given the variety of
abilities  studied  as  predictors  of  performance,  any  conceptual  model  based 
on  unique  human  abilities  that  will  predict  late  performance  better  than 
early  performance  faces  long  odds  in  its  search  for  new  abilities. 
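The regression-weight pattern expected under the changing task explanation can be sketched as follows. The sketch assumes two uncorrelated, stable abilities, one labeled "early" and one "late". Their true weights in task performance shift across five hypothetical practice stages, from (.8, .2) to (.2, .8). All weights, the error variance, and the sample size are invented for illustration. At each stage an ordinary least squares regression, solved from the 2 × 2 normal equations, recovers the shift.

```python
import random


def ols_two(y, x1, x2):
    """OLS slopes for y on two predictors, solved from the 2x2 normal equations."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


rng = random.Random(7)
N = 5_000
early = [rng.gauss(0, 1) for _ in range(N)]  # hypothetical "early" ability
late = [rng.gauss(0, 1) for _ in range(N)]   # hypothetical "late" ability
# Hypothetical true weights per stage: task demands shift from early to late ability.
STAGE_WEIGHTS = [(0.8, 0.2), (0.65, 0.35), (0.5, 0.5), (0.35, 0.65), (0.2, 0.8)]

fitted = []
for w_early, w_late in STAGE_WEIGHTS:
    perf = [w_early * e + w_late * lt + rng.gauss(0, 0.5) for e, lt in zip(early, late)]
    fitted.append(ols_two(perf, early, late))

b_early = [b for b, _ in fitted]
b_late = [b for _, b in fitted]
for stage, (b_e, b_l) in enumerate(fitted, start=1):
    print(f"stage {stage}: b_early={b_e:.3f}  b_late={b_l:.3f}")
```

The sketch only shows what the pattern would look like if "late" abilities existed and could be measured. Whether such abilities exist is the empirical question raised above, and the simulation cannot answer it.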